Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

Yin-Jyun Luo$^{1, 3}$, Chin-Cheng Hsu$^{2}$, Kat Agres$^{3,4}$, Dorien Herremans$^{1,3}$

$^{1}$Singapore University of Technology and Design
$^{2}$University of Southern California
$^{3}$Institute of High Performance Computing, A*STAR, Singapore
$^{4}$Yong Siew Toh Conservatory of Music, National University of Singapore
$\tt yinjyun\_luo@mymail.sutd.edu.sg$

Many-to-Many Singing Technique Conversion

This demo page focuses on singing technique conversion.
The following are the audio samples of Fig. 2(b) in the paper.

The audio files below are all converted from Mel-spectrograms using Griffin-Lim.
Therefore, the samples labeled "Original Mel-spectrogram" represent the upper bound on audio quality for each conversion.
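As a rough illustration of the inversion step described above, the following is a minimal NumPy-only sketch of Griffin-Lim applied to a Mel-spectrogram. It is not the authors' pipeline: the FFT size, hop length, triangular Mel filterbank, pseudo-inverse Mel-to-linear mapping, and all function names (`stft`, `istft`, `mel_filterbank`, `mel_to_audio`) are assumptions made for this example, and the input is assumed to be a linear-magnitude (not log-scaled) Mel-spectrogram.

```python
import numpy as np

N_FFT, HOP = 512, 128  # assumed analysis settings, not taken from the paper
PAD = N_FFT // 2       # centre the frames so the signal edges are well covered

def hann():
    # Periodic Hann window.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)

def stft(x):
    """Complex STFT with shape (n_frames, N_FFT // 2 + 1)."""
    xp = np.pad(x, PAD)  # zero-pad both ends
    n_frames = 1 + (len(xp) - N_FFT) // HOP
    frames = np.stack([xp[i * HOP:i * HOP + N_FFT] for i in range(n_frames)])
    return np.fft.rfft(frames * hann(), axis=1)

def istft(spec):
    """Least-squares inverse of stft() by windowed overlap-add."""
    w = hann()
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * w
    length = (len(frames) - 1) * HOP + N_FFT
    x, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += w ** 2
    x = x / np.maximum(norm, 1e-8)
    return x[PAD:-PAD]  # drop the padding added in stft()

def mel_filterbank(sr, n_mels):
    """Triangular Mel filters with shape (n_mels, N_FFT // 2 + 1)."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = to_hz(np.linspace(0.0, to_mel(sr / 2), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2, N_FFT // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m:m + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_to_audio(mel, fb, n_iter=50, seed=0):
    """Invert a (n_frames, n_mels) magnitude Mel-spectrogram with Griffin-Lim."""
    # Approximate the linear magnitudes; the Mel projection is lossy.
    mag = np.maximum(0.0, mel @ np.linalg.pinv(fb).T)
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        # Keep the target magnitudes, take the phase of a consistent STFT.
        audio = istft(mag * phase)
        phase = np.exp(1j * np.angle(stft(audio)))
    return istft(mag * phase)
```

Because the phase is estimated rather than recorded, and the Mel projection discards spectral detail, even a ground-truth Mel-spectrogram is resynthesized with some loss. This is why the inverted original serves as the quality ceiling for the converted samples.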

We first give examples of each singing technique.

Again, these samples are obtained by inverting their Mel-spectrograms to audio.

Note that each singer may express a given vocal technique in a distinct way, and may also sing with varying degrees of expression over time. This makes vocal techniques ambiguous to define and poses challenges for data-driven models.

straight: f1, m9, f1
belt: m6, f7, m3
breathy: f6, f4, f8
lip trill: f9, m8, f9
vibrato: m1, f7, m11
vocal fry: m3, m2, f3

The first row of Fig. 2(b) in the paper

m8 lip trill
Original Mel-spectrogram
Reconstructed Mel-spectrogram
Convert to straight
Convert to belt
Convert to breathy
Convert to vibrato
Convert to vocal fry

The second row of Fig. 2(b) in the paper

m9 straight
Original Mel-spectrogram
Reconstructed Mel-spectrogram
Convert to belt
Convert to breathy
Convert to lip trill
Convert to vibrato
Convert to vocal fry

Additional samples for singing technique conversion

f4 breathy
Original Mel-spectrogram
Reconstructed Mel-spectrogram
Convert to belt
Convert to lip trill
Convert to vibrato
Convert to vocal fry

We also provide samples for many-to-many singer conversion.

The first row of Fig. 2(a) in the paper

m3 belt
Original Mel-spectrogram
Reconstructed Mel-spectrogram
Convert to f1
Convert to f2
Convert to f4
Convert to m4
Convert to m6

The second row of Fig. 2(a) in the paper

f4 breathy
Original Mel-spectrogram
Reconstructed Mel-spectrogram
Convert to f1
Convert to f2
Convert to m3
Convert to m4
Convert to m6

Additional samples for singer conversion

f2 straight
Original Mel-spectrogram
Reconstructed Mel-spectrogram
Convert to f1
Convert to f2
Convert to m3
Convert to m4
Convert to m6